Fuzzy set representation of a prior
distribution 

Glen Meeden* 1 
Abstract: In the subjective Bayesian approach uncertainty is described by a
prior distribution chosen by the statistician. Fuzzy set theory is another way 
of representing uncertainty. Here we give a decision theoretic approach which 
allows a Bayesian to convert their prior distribution into a fuzzy set member- 
ship function. This yields a formal relationship between these two different 
methods of expressing uncertainty. 
1. Introduction

For a subjective Bayesian, uncertainty about the unknown parameter or state of nature
can be expressed through a prior distribution. If $\theta$ denotes a typical parameter
value and $\Theta$ the set of all possible parameter values, then the prior distribution over
$\Theta$ summarizes their knowledge and beliefs about the parameter.

Fuzzy set theory, introduced in Zadeh [7], is another approach to representing
uncertainty. A fuzzy set $A$, a subset of $\Theta$, is characterized by its membership
function. This is a function defined on $\Theta$ whose range is contained in the unit interval.
At a point $\theta$ the value of the membership function is a measure of how much we
think $\theta$ belongs to the set $A$. Statisticians have been slow to embrace fuzzy set
theory. Taheri [6] gives a review of applications of fuzzy set theory concepts to
statistical methodology. Bayesians have shown less interest in fuzzy ideas than
frequentists. Singpurwalla and Booker [5] have proposed a model which incorporates
fuzzy membership functions into a subjective Bayesian setup. However, they do not
give membership functions a probabilistic interpretation. In the imprecise or vague
approach to Bayesian statistics a decision maker selects a family of possible prior
distributions to represent their prior beliefs. De Cooman [2] presents an uncertainty
model for vague probability assessments that is closely related to Zadeh's approach
[7].

The concept of confidence intervals is a frequentist approach to expressing un- 
certainty about an unknown parameter given data. It has long been recognized that 
naive users have difficulty interpreting confidence intervals. They have a tendency 
to give a probabilistic interpretation to the observed confidence interval. 

It has also long been known that for discrete data conventional confidence inter- 
vals, which we will also call "crisp" confidence intervals, using a term from fuzzy 
set theory, can perform poorly. A recent article [1] reviews the problems with crisp 
confidence intervals for binomial models. Because of the inherent flaws in crisp con- 
fidence intervals for discrete problems a new confidence interval notion has been 
suggested called fuzzy confidence intervals [3]. Given the data, a fuzzy confidence
interval is just the membership function of the set of plausible or reasonable values
for $\theta$. One way to think about such membership functions is that they are
generalizations of randomized intervals where no randomization is ever implemented.
Geyer and Meeden [3] argued that fuzzy confidence intervals overcome the difficulties
of the usual crisp intervals for discrete probability models.

In terms of frequency of coverage, Bayesian credible intervals for discrete data will
suffer from the same problem that conventional intervals do. This should be of
concern to objective Bayesians who want their intervals to have good frequentist 
properties. One way to approach this problem is to find a method that allows them 
to use their posterior to get a sensible fuzzy interval instead of the usual Bayesian 
credible interval. 

Here we consider a no data statistical decision problem where the set of possible
decisions is the class of all membership functions defined on $\Theta$. We then define a
family of loss functions. These functions measure the loss incurred when a probabil- 
ity distribution is replaced by a fuzzy membership function. For any loss function 
in the family and a given prior distribution we solve the resulting no data decision 
problem. This gives a method for converting a prior or posterior into a fuzzy mem- 
bership function. For a given fuzzy membership function we also study the problem 
of identifying the family of prior distributions whose common solution to the no 
data decision problem is this function. This sets up a formal relationship between 
the two theories. 

2. Fuzzy set theory 

We will only use some of the basic concepts and terminology of fuzzy set theory, 
which can be found in the most elementary of introductions to the subject [4]. 

A fuzzy set $A$ in a space $\Theta$ is characterized by its membership function, which is
a map $I_A : \Theta \to [0, 1]$. The value $I_A(\theta)$ is the "degree of membership" of the
point $\theta$ in the fuzzy set $A$ or the "degree of compatibility . . . with the concept
represented by the fuzzy set." See ([4], p. 75). The idea is that we are uncertain about
whether $\theta$ is in or out of the set $A$. The value $I_A(\theta)$ represents how much we
think $\theta$ is in the fuzzy set $A$. The closer $I_A(\theta)$ is to 1.0, the more we think
$\theta$ is in $A$. The closer $I_A(\theta)$ is to 0.0, the more we think $\theta$ is not in $A$.

A fuzzy set whose membership function only takes on the values zero or one is
called crisp. For a crisp set, the membership function $I_A$ is the same thing as the
indicator function of an ordinary set $A$. Thus "crisp" is just the fuzzy set theory
way of saying "ordinary," and "membership function" is the fuzzy set theory way of
saying "indicator function." The complement of a fuzzy set $A$ having membership
function $I_A$ is the fuzzy set $B$ having membership function $I_B = 1 - I_A$.

If $I_A$ is the membership function of a fuzzy set $A$, the $\gamma$-cut of $A$ ([4], Section 5.1)
is the crisp set

$${}^{\gamma}I_A = \{\theta : I_A(\theta) \ge \gamma\}.$$

Clearly, knowing all the $\gamma$-cuts for $0 \le \gamma \le 1$ tells us everything there is to know
about the fuzzy set $A$. The 1-cut is also called the core of $A$, denoted core($A$), and
the set

$$\mathrm{supp}(A) = \bigcup_{\gamma > 0} {}^{\gamma}I_A = \{\theta : I_A(\theta) > 0\}$$

is called the support of $A$ ([4], p. 100). A fuzzy set is said to be convex if each $\gamma$-cut
is convex ([4], pp. 104-105).
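
As a rough illustration of these definitions, the following Python sketch evaluates a membership function on a finite grid and extracts a $\gamma$-cut, the core and the support. The grid, the triangular membership function and the helper names are assumptions made for this example only; they do not come from [4].

```python
def gamma_cut(grid, membership, gamma):
    """Grid points theta with I_A(theta) >= gamma (the gamma-cut)."""
    return [t for t in grid if membership(t) >= gamma]


def support(grid, membership):
    """Grid points theta with I_A(theta) > 0."""
    return [t for t in grid if membership(t) > 0.0]


def triangular(theta):
    """Hypothetical membership function on [0, 1], peaking at theta = 0.5."""
    return max(0.0, 1.0 - 4.0 * abs(theta - 0.5))


grid = [i / 100 for i in range(101)]       # illustrative grid on [0, 1]
half_cut = gamma_cut(grid, triangular, 0.5)
core = gamma_cut(grid, triangular, 1.0)    # the 1-cut
supp = support(grid, triangular)
```

On a grid each cut of this particular function is a run of consecutive points, which is the discrete analogue of the convexity condition above.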



3. A decision problem 



For simplicity we assume that $\Theta$ is an interval of real numbers and the prior $\pi$ is a
continuous probability density function defined on it.

Let $\mathcal{A}$ be the class of all measurable membership functions defined on $\Theta$. Then
$\mathcal{A}$ is the space of possible decisions or actions with a typical member denoted by $A$.
Given a prior density $\pi$ on $\Theta$ we want to find the membership function or fuzzy set
$A$ which best represents $\pi$. We do this by defining a loss function and then solving
the no data decision problem.

Our loss function will depend on four known parameters which are specified by
the statistician. They are $a_1 \ge 0$, $a_2 \ge 0$, $b_1 \ge 0$ and $b_2 \ge 0$, where at least one of
the $a_i$'s and at least one of the $b_i$'s must be strictly positive. Then the loss incurred
when action $A$ is taken and $\theta$ is the true state of nature is given by



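$$L(A,\theta) = a_1\{1 - I_A(\theta)\} + \frac{a_2}{2}\{1 - I_A(\theta)\}^2 + \int_\Theta \Bigl\{ b_1 I_A(\theta) + \frac{b_2}{2}\bigl(I_A(\theta)\bigr)^2 \Bigr\}\,d\theta. \tag{1}$$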

To understand this loss function remember that we want to find the fuzzy set
or membership function $A$ which best represents the set of sensible or reasonable
parameter values under our prior $\pi$. Hence if $\theta$ is the true parameter point we
want $I_A(\theta)$ to be close to 1. This explains the presence of the first two terms in
equation (1). But on the other hand we do not want the fuzzy set to be too large.
This is controlled by the last term in the equation which is a measure of the overall
size of the fuzzy set.

We now find the solution for this no data decision problem. 

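Theorem 1. Let $\pi(\theta)$ be a prior density on $\Theta$. Then for the loss function of
equation (1) the fuzzy set membership function $I_A$ which satisfies

$$\int_\Theta L(A,\theta)\,\pi(\theta)\,d\theta = \inf_{A' \in \mathcal{A}} \int_\Theta L(A',\theta)\,\pi(\theta)\,d\theta$$

is given by

$$I_A(\theta) = \begin{cases} 0, & \text{for } 0 \le \pi(\theta) \le b_1/(a_1 + a_2), \\[4pt] \dfrac{(a_1 + a_2)\,\pi(\theta) - b_1}{a_2\,\pi(\theta) + b_2}, & \text{for } b_1/(a_1 + a_2) < \pi(\theta) < (b_1 + b_2)/a_1, \\[4pt] 1, & \text{for } \pi(\theta) \ge (b_1 + b_2)/a_1. \end{cases} \tag{2}$$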
Proof. Note that we can write

$$\int_\Theta L(A',\theta)\,\pi(\theta)\,d\theta = \int_\Theta \Bigl[ \Bigl\{ a_1\{1 - I_{A'}(\theta)\} + \frac{a_2}{2}\{1 - I_{A'}(\theta)\}^2 \Bigr\}\pi(\theta) + b_1 I_{A'}(\theta) + \frac{b_2}{2}\bigl(I_{A'}(\theta)\bigr)^2 \Bigr]\,d\theta$$

so that to find the solution it is enough to minimize the integrand of the previous
equation for each fixed value of $\theta$. But for a fixed $\theta$ the integrand is just a quadratic
function of $I_{A'}(\theta)$ and a simple calculus argument completes the proof. □
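
The calculus step can be spelled out as follows. This is our reading of the argument, using the shorthand $x = I_{A'}(\theta)$ and $p = \pi(\theta)$ (notation introduced here for convenience) and assuming $a_2 p + b_2 > 0$ so that the stationary point is well defined.

```latex
\begin{align*}
g(x)   &= \Bigl\{ a_1 (1 - x) + \tfrac{a_2}{2} (1 - x)^2 \Bigr\} p
          + b_1 x + \tfrac{b_2}{2} x^2, \qquad 0 \le x \le 1, \\
g'(x)  &= -a_1 p - a_2 (1 - x) p + b_1 + b_2 x, \\
g''(x) &= a_2 p + b_2 \;\ge\; 0, \\
g'(x^*) = 0 \;&\Longrightarrow\; x^* = \frac{(a_1 + a_2)\, p - b_1}{a_2\, p + b_2}.
\end{align*}
```

Since $g$ is convex, its minimizer over $[0, 1]$ is $x^*$ truncated to the unit interval. For $a_1 > 0$ we have $x^* \le 0$ exactly when $p \le b_1/(a_1 + a_2)$ and $x^* \ge 1$ exactly when $p \ge (b_1 + b_2)/a_1$, which gives the three branches of equation (2).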

The theorem remains true when $a_1 = 0$ if we assume dividing by zero yields
infinity.
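
A minimal Python sketch of equation (2) follows. The function names, the Beta(2, 2) prior and the parameter values are illustrative assumptions, not part of the paper. The threshold comparisons are written with multiplications rather than divisions, so the case $a_1 = 0$ is handled without any special convention.

```python
def optimal_membership(p, a1, a2, b1, b2):
    """Equation (2): optimal I_A at a point where the prior density equals p."""
    if (a1 + a2) * p <= b1:      # p <= b1 / (a1 + a2)
        return 0.0
    if a1 * p >= b1 + b2:        # p >= (b1 + b2) / a1; never true when a1 == 0
        return 1.0
    return ((a1 + a2) * p - b1) / (a2 * p + b2)


def pointwise_risk(x, p, a1, a2, b1, b2):
    """Integrand of the expected loss at one theta, with x = I_A(theta), p = pi(theta)."""
    return (a1 * (1 - x) + 0.5 * a2 * (1 - x) ** 2) * p + b1 * x + 0.5 * b2 * x ** 2


def beta22(theta):
    """Hypothetical prior: the Beta(2, 2) density on [0, 1]."""
    return 6.0 * theta * (1.0 - theta)


a1, a2, b1, b2 = 1.0, 1.0, 0.5, 0.5   # illustrative loss parameters
fuzzy = [(k / 10, optimal_membership(beta22(k / 10), a1, a2, b1, b2))
         for k in range(11)]
```

With these values the membership is zero where the density is at most 0.25, one where it is at least 1, and strictly between zero and one elsewhere.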

Note that the solution is unchanged if the loss function is multiplied by a positive 
number. Without loss of generality we could set one of the four parameters defining 
the loss function equal to one but having four parameters will be convenient in the 
following discussion. 

As with any decision problem the solution depends strongly on the loss function. 
We believe our family of loss functions is flexible and captures some of the important 
aspects of the problem. Finding a good fuzzy set to summarize our information 
about a parameter is much like finding a good credible set. We want it to include 
the likely values but without it getting too large. The loss function in equation (1)
is essentially the sum of two quadratic functions. The first part is quadratic in 
non-membership in the set of likely values while the second part is quadratic in 
a measure of the size of the set. If we just include the linear terms in each part 
then the optimal solution will always be a crisp set. It is necessary to include the 
quadratic terms to get a true fuzzy set as a solution. 
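
To see why the linear terms alone give a crisp set, consider the pointwise objective with $a_2 = b_2 = 0$. The shorthand $x = I_A(\theta)$ and $p = \pi(\theta)$ is introduced here for this sketch and $a_1 > 0$ is assumed.

```latex
\begin{align*}
g(x) &= a_1 (1 - x)\, p + b_1 x \;=\; a_1 p + (b_1 - a_1 p)\, x, \qquad 0 \le x \le 1, \\
x^*  &= \begin{cases} 0, & p < b_1 / a_1, \\ 1, & p > b_1 / a_1. \end{cases}
\end{align*}
```

The objective is linear in $x$, so its minimum over $[0, 1]$ is attained at an endpoint; only on the set where $p = b_1/a_1$ is every value of $x$ optimal. Intermediate membership values are therefore never forced by the loss.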

We see from equation (2) that the optimal membership function is related to the
prior $\pi$ in a sensible fashion. The solution is 1 where the prior is large, 0 where the
prior is small and a rescaling between the two cases. Note that for a given bounded
$\pi$ if $b_1$ is chosen large enough then the solution to our decision problem is the
membership function which is identically zero. On the other hand if $\pi$ is bounded
away from zero and $a_1$ is chosen large enough then the solution to our decision
problem is the membership function which is identically one.

4. Relating priors and fuzzy sets 

We have considered the problem of converting a prior distribution into a fuzzy 
membership function. In some situations it could be of interest to be able to move 
in the other direction. That is, transform the uncertainty expressed in a fuzzy 
membership function into the Bayesian paradigm. One way to do this would be to 
find a loss function and prior for which the solution to our decision problem is the 
fuzzy membership function in hand. This suggests the following three questions. 

• For a specified fuzzy membership function, $I_A$, and a specified loss function
does there exist a prior density function for which the solution to our decision
problem is $I_A$?

• For a specified fuzzy membership function, $I_A$, does there exist a loss function
and a prior density function for which the solution to our decision problem is
$I_A$?

• If a solution does exist for question 1 is it unique?
We see from equation (2) that for $I_A$ to be a solution for $\pi$ we must have

$$\pi(\theta) = \frac{b_1 + b_2 I_A(\theta)}{a_1 + a_2\{1 - I_A(\theta)\}} \quad \text{for } \theta \text{ where } 0 < I_A(\theta) < 1. \tag{3}$$

From this we see that the answer to our first question is no. This is because when
$\Theta$ is unbounded $\pi$ in the previous equation need not be integrable and even when it
is it need not integrate to one. The answer to the second question is yes whenever
$I_A(\theta)$ is integrable, since in this case we can always select $b_1 \ge 0$ and $b_2 \ge 0$ to
make $\pi(\theta)$ of equation (3) a density. When a solution exists it need not be unique.
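
The inversion in equation (3) can be checked numerically with a short Python sketch. The function names and parameter values are assumptions made for this example. The sketch only verifies the pointwise round trip; it does not check that the resulting function integrates to one.

```python
def prior_from_membership(i, a1, a2, b1, b2):
    """Equation (3): prior density value matching a membership value 0 < i < 1."""
    return (b1 + b2 * i) / (a1 + a2 * (1.0 - i))


def membership_from_prior(p, a1, a2, b1, b2):
    """Middle branch of equation (2), valid when the result lies in (0, 1)."""
    return ((a1 + a2) * p - b1) / (a2 * p + b2)


a1, a2, b1, b2 = 1.0, 7.0, 0.5, 2.0          # illustrative loss parameters
levels = [k / 10 for k in range(1, 10)]       # membership values 0.1, ..., 0.9
densities = [prior_from_membership(i, a1, a2, b1, b2) for i in levels]
recovered = [membership_from_prior(p, a1, a2, b1, b2) for p in densities]
```

Here `recovered` agrees with `levels` up to floating point error, and `densities` is increasing in the membership level, so larger membership always corresponds to larger prior density.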

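For a simple example we set $a_2 = 0$ and let the other three parameters be
positive. Consider the special case where $\Theta$ is bounded. If we set

$$r_1 = b_1/a_1 \quad \text{and} \quad r_2 = (b_1 + b_2)/a_1 \tag{4}$$

we find that

$$a_1 = b_2/(r_2 - r_1) \quad \text{and} \quad b_1 = b_2 r_1/(r_2 - r_1) \tag{5}$$

and the solution from equation (2) has the form

$$I_A(\theta) = \begin{cases} 0, & \text{for } 0 \le \pi(\theta) \le r_1, \\ (\pi(\theta) - r_1)/(r_2 - r_1), & \text{for } r_1 < \pi(\theta) < r_2, \\ 1, & \text{for } \pi(\theta) \ge r_2. \end{cases} \tag{6}$$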

Now let $I_A$ be given and assume that the length of $\Theta$ is $\ell$. If $r_1 < 1/\ell$ then there
exists a unique $r_2 > r_1$ such that

$$\pi_{A,r_1}(\theta) = (r_2 - r_1)\, I_A(\theta) + r_1 \quad \text{for } \theta \in \Theta \tag{7}$$

is a prior distribution over $\Theta$. Moreover we can find values for $a_1$, $b_1$ and $b_2$ which
satisfy equation (4). With this loss function $I_A$ will be the solution to our decision
problem when the prior is $\pi_{A,r_1}$. Furthermore if the sets where $I_A(\theta) = 0$ and
$I_A(\theta) = 1$ each have positive Lebesgue measure then it will not be the unique prior
with this property. Any prior density $\pi$ satisfying

$$\begin{cases} \pi(\theta) \le r_1 & \text{when } I_A(\theta) = 0, \\ \pi(\theta) = \pi_{A,r_1}(\theta) & \text{when } 0 < I_A(\theta) < 1, \\ \pi(\theta) \ge r_2 & \text{when } I_A(\theta) = 1 \end{cases} \tag{8}$$

will also be a solution for our decision problem.
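
The construction in equation (7) can be sketched in Python as follows. The choice of $\Theta = [0, 1]$, the example membership function (the one in equation (9)), the value $r_1 = 0.5$, the scale $b_2 = 1$ and the midpoint-rule integrator are all assumptions made for illustration.

```python
def membership(theta):
    """Example membership function on Theta = [0, 1] (equation (9))."""
    return 6.075 * theta ** 2 * (1.0 - theta)


def integrate(f, lower, upper, n=2000):
    """Midpoint-rule approximation of the integral of f over [lower, upper]."""
    h = (upper - lower) / n
    return h * sum(f(lower + (k + 0.5) * h) for k in range(n))


def upper_threshold(r1, lower, upper):
    """Solve for r2 so that (r2 - r1) * I_A + r1 integrates to one."""
    length = upper - lower
    if not 0.0 <= r1 < 1.0 / length:
        raise ValueError("r1 must satisfy 0 <= r1 < 1 / length")
    mass = integrate(membership, lower, upper)   # integral of I_A over Theta
    return r1 + (1.0 - r1 * length) / mass


def prior(theta, r1, r2):
    """Equation (7)."""
    return (r2 - r1) * membership(theta) + r1


def solution(p, r1, r2):
    """Equation (6): optimal membership when a2 = 0."""
    if p <= r1:
        return 0.0
    if p >= r2:
        return 1.0
    return (p - r1) / (r2 - r1)


r1 = 0.5
r2 = upper_threshold(r1, 0.0, 1.0)
b2 = 1.0                 # arbitrary positive scale (assumption)
a1 = b2 / (r2 - r1)      # first part of equation (5)
b1 = a1 * r1             # from r1 = b1 / a1 in equation (4)
```

Fixing $b_2$ is harmless because the solution is unchanged when the loss is multiplied by a positive constant.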

Among the set of possible solutions the one in equation (7) has two nice properties.
First of all it is continuous whenever $I_A(\theta)$ is continuous. Secondly it treats the
members of $\{\theta : I_A(\theta) = 1\}$ similarly and the members of $\{\theta : I_A(\theta) = 0\}$ similarly.
More importantly this identification of a fuzzy membership function with a class 
of prior distributions demonstrates that we can give roughly equivalent expressions 
of uncertainty in the Bayesian and fuzzy paradigms. 

Finally, we address the question of uniqueness. The previous discussion indicates 
that if we want uniqueness we should consider membership functions which never 
take on zero or one as a possible value. Let $I_A$ be such a membership function and
let $a_1 > 0$ and $a_2 > 0$ be fixed and suppose $\Theta$ is the unit interval. Then integrating
equation (3) we have

$$\int_0^1 \pi(\theta)\,d\theta = b_1 \int_0^1 \frac{1}{a_1 + a_2\{1 - I_A(\theta)\}}\,d\theta + b_2 \int_0^1 \frac{I_A(\theta)}{a_1 + a_2\{1 - I_A(\theta)\}}\,d\theta = b_1 c_1(a_1, a_2) + b_2 c_2(a_1, a_2).$$

Hence $\pi$ will be a probability density function whenever

$$b_1 \in [0, 1/c_1(a_1, a_2)] \quad \text{and} \quad b_2 = \{1 - b_1 c_1(a_1, a_2)\}/c_2(a_1, a_2).$$

To better understand the relationship between $I_A$ and its corresponding prior
we consider a simple example. Let

$$I_A(\theta) = 6.075\,\theta^2 (1 - \theta) \quad \text{for } \theta \in [0, 1]. \tag{9}$$

We consider two different choices of the $a_i$'s and for each case two different choices
of $b_1$. For the first case $a_1 = 1$ and $a_2 = 7$. The maximum possible value for $b_1$ is 3.40
and our two choices for the $b_i$'s are $b_1 = 0.01$, $b_2 = 5.15$ and $b_1 = 3.35$, $b_2 = 0.072$.
In the second case $a_1 = 4$ and $a_2 = 2$. The maximum possible value for $b_1$ is 4.91
and our two choices for the $b_i$'s are $b_1 = 0.01$, $b_2 = 9.02$ and $b_1 = 4.50$, $b_2 = 0.76$.
For each of the four combinations we found the unique prior whose solution to
the decision problem yields the fuzzy membership function of equation (9). The
membership function along with the four priors are shown in the figure.
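
The constants in this example can be recomputed with a short numerical sketch. The Simpson-rule integrator and the function names are assumptions made here; the paper does not say how the integrals were evaluated. The computed values should agree with those quoted above up to rounding.

```python
def membership(theta):
    """Membership function of equation (9)."""
    return 6.075 * theta ** 2 * (1.0 - theta)


def simpson(f, lower, upper, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (upper - lower) / n
    total = f(lower) + f(upper)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lower + k * h)
    return total * h / 3.0


def normalizing_constants(a1, a2):
    """The integrals c1(a1, a2) and c2(a1, a2) over the unit interval."""
    def denom(t):
        return a1 + a2 * (1.0 - membership(t))
    c1 = simpson(lambda t: 1.0 / denom(t), 0.0, 1.0)
    c2 = simpson(lambda t: membership(t) / denom(t), 0.0, 1.0)
    return c1, c2


def b2_for(b1, a1, a2):
    """Value of b2 that makes the prior of equation (3) integrate to one."""
    c1, c2 = normalizing_constants(a1, a2)
    if not 0.0 <= b1 <= 1.0 / c1:
        raise ValueError("b1 must lie in [0, 1 / c1]")
    return (1.0 - b1 * c1) / c2


for a1, a2, choices in [(1.0, 7.0, (0.01, 3.35)), (4.0, 2.0, (0.01, 4.50))]:
    c1, _ = normalizing_constants(a1, a2)
    print(f"a1={a1}, a2={a2}: max b1 = {1.0 / c1:.2f}")
    for b1 in choices:
        print(f"  b1={b1}: b2 = {b2_for(b1, a1, a2):.3f}")
```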

The membership function is the solid curve. The two curves with the two largest
maximums are the solutions for the first case where $a_1 = 1$ and $a_2 = 7$. Of the two
solutions the one with $b_1 = 0.01$ has the larger maximum. The other two curves are
the solutions for the second case. Again the solution for $b_1 = 0.01$ has the larger of
the two maximums. These curves demonstrate what a closer inspection of equation
(3) yields. For fixed $a_1$ and $a_2$ the solution becomes less concentrated about its
mode as $b_1$ increases from zero to its maximum value. Also the solution becomes
less concentrated about its mode as we increase $a_1$ and decrease $a_2$. But in all cases
the priors do reflect the shape of their common solution.

An interesting consequence of this unique correspondence is that it gives a way 
to update a large class of fuzzy membership functions given data. Suppose an 
expert has selected a fuzzy membership function to represent their uncertainty. 
The statistician then selects appropriate values of the $a_i$'s and the $b_i$'s and uses
equation (3) to transform it into a prior. Then given the data they find the posterior
distribution which is then converted back to a fuzzy membership function using the
theorem with the same $a_i$ and $b_i$ values.
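
The updating scheme just described can be sketched on a grid as follows. The binomial data (3 successes in 10 trials), the choice $a_1 = 1$, $a_2 = 7$, $b_1 = 0.01$, the midpoint grid and all function names are hypothetical choices made for this illustration only.

```python
def membership(theta):
    """Expert's membership function; equation (9) is used as an example."""
    return 6.075 * theta ** 2 * (1.0 - theta)


def prior_from_membership(i, a1, a2, b1, b2):
    """Equation (3)."""
    return (b1 + b2 * i) / (a1 + a2 * (1.0 - i))


def membership_from_density(p, a1, a2, b1, b2):
    """Equation (2), with the thresholds compared without division."""
    if (a1 + a2) * p <= b1:
        return 0.0
    if a1 * p >= b1 + b2:
        return 1.0
    return ((a1 + a2) * p - b1) / (a2 * p + b2)


def integrate(values, n):
    """Midpoint rule on [0, 1] for values sampled at the n midpoints."""
    return sum(values) / n


a1, a2, b1 = 1.0, 7.0, 0.01
n = 2000
grid = [(k + 0.5) / n for k in range(n)]
memb = [membership(t) for t in grid]

# Step 1: choose b2 so that the prior of equation (3) integrates to one.
c1 = integrate([1.0 / (a1 + a2 * (1.0 - i)) for i in memb], n)
c2 = integrate([i / (a1 + a2 * (1.0 - i)) for i in memb], n)
b2 = (1.0 - b1 * c1) / c2
prior = [prior_from_membership(i, a1, a2, b1, b2) for i in memb]

# Step 2: Bayes update with hypothetical data (3 successes in 10 trials).
successes, trials = 3, 10
like = [t ** successes * (1.0 - t) ** (trials - successes) for t in grid]
unnormalized = [p * l for p, l in zip(prior, like)]
norm = integrate(unnormalized, n)
posterior = [u / norm for u in unnormalized]

# Step 3: convert the posterior back with the same a_i and b_i values.
updated = [membership_from_density(p, a1, a2, b1, b2) for p in posterior]
```

Unlike the prior, the posterior may fall below $b_1/(a_1 + a_2)$ or above $(b_1 + b_2)/a_1$ on part of the grid, in which case the updated membership function equals zero or one there.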

This result is somewhat surprising because a fuzzy membership function must satisfy
fewer conditions than a probability density function, as it need not be integrable.
At first glance the previous example where a membership function corresponded
to a family of priors seems more reasonable. To get the unique correspondence,
however, we made two fairly strict assumptions. The function in equation
(3) needed to be integrable and the range of the membership function had to lie
in the open unit interval. Both these conditions on the membership function seem
not so surprising if we hope to convert it to a probability density function.

Fig 1. A plot of the fuzzy membership function (the solid line) in equation (9) and four priors
whose common solution for four loss functions is the fuzzy membership function. The two priors
for the $a_1 = 1$ and $a_2 = 7$ case are the ones with the two largest maximums. The other two priors
are for the $a_1 = 4$ and $a_2 = 2$ case.

5. Some final remarks

Mainline statistics has shown little interest in fuzzy set theory. This is especially
true for most Bayesians since they believe that they already have a good way to
express uncertainty. Here we have argued that Bayesians should be more interested
in fuzzy set theory. For discrete data, just as for frequentists, there are certain
advantages to considering interval estimates as fuzzy sets. We noted that our scheme
for converting a prior density into a fuzzy membership function could also be used
to relate some fuzzy membership functions to prior densities. In some cases a fuzzy
membership function will correspond to a family of densities while under more
restricted conditions it will correspond to a unique density. The relationship seems
intuitively sensible and as far as we know it is the first simple formal correspondence
between the two theories which until now have lived in different worlds.

A copy of Geyer and Meeden [3] and related material can be found at
http://www.stat.umn.edu/~glen/papers/.